Load the necessary libraries
library(car) # for regression diagnostics
library(broom) # for tidy output
library(ggfortify) # for model diagnostics
library(sjPlot) # for outputs
library(knitr) # for kable
library(effects) # for partial effects plots
library(emmeans) # for estimating marginal means
library(ggeffects) # for partial effects plots
library(MASS) # for glm.nb
library(MuMIn) # for AICc
library(DHARMa) # for residual diagnostics plots
library(modelr) # for auxiliary modelling functions
library(performance) # for residuals diagnostics
library(see) # for plotting residuals
library(patchwork) # for grids of plots
library(tidyverse) # for data wrangling
Loyn (1987) modelled the abundance of forest birds with six predictor variables (patch area, distance to nearest patch, distance to nearest larger patch, grazing intensity, altitude and years since the patch had been isolated).
Figure: Regent honeyeater.
Format of loyn.csv data file
| ABUND | DIST | LDIST | AREA | GRAZE | ALT | YR.ISOL |
|---|---|---|---|---|---|---|
| .. | .. | .. | .. | .. | .. | .. |
| Variable | Description |
|---|---|
| ABUND | Abundance of forest birds in the patch - response variable |
| DIST | Distance to the nearest patch - predictor variable |
| LDIST | Distance to the nearest larger patch - predictor variable |
| AREA | Size of the patch - predictor variable |
| GRAZE | Grazing intensity (1 to 5, representing light to heavy) - predictor variable |
| ALT | Altitude - predictor variable |
| YR.ISOL | Number of years since the patch was isolated - predictor variable |
The aim of the analysis is to investigate the effects of a range of predictors on the abundance of forest birds.
loyn <- read_csv("../public/data/loyn.csv", trim_ws = TRUE)
glimpse(loyn)
## Rows: 56
## Columns: 7
## $ ABUND <dbl> 5.3, 2.0, 1.5, 17.1, 13.8, 14.1, 3.8, 2.2, 3.3, 3.0, 27.6, 1.8…
## $ AREA <dbl> 0.1, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.…
## $ YR.ISOL <dbl> 1968, 1920, 1900, 1966, 1918, 1965, 1955, 1920, 1965, 1900, 19…
## $ DIST <dbl> 39, 234, 104, 66, 246, 234, 467, 284, 156, 311, 66, 93, 39, 40…
## $ LDIST <dbl> 39, 234, 311, 66, 246, 285, 467, 1829, 156, 571, 332, 93, 39, …
## $ GRAZE <dbl> 2, 5, 5, 3, 5, 3, 5, 5, 4, 5, 3, 5, 2, 1, 5, 5, 3, 3, 3, 2, 2,…
## $ ALT <dbl> 160, 60, 140, 160, 140, 130, 90, 60, 130, 130, 210, 160, 210, …
This is an application of multiple regression. The response variable in this instance is a little awkward in that it appears to be an average of multiple point quadrats rather than a pure count. The central limit theorem suggests that averages should follow a normal distribution, and thus we might expect that it is reasonable to model this bird abundance against a Gaussian distribution. However, this has the potential to become problematic since a Gaussian distribution can extend below zero, whereas this is clearly not logical for bird abundance. As a result, we might find that the model can predict bird abundances less than zero.
If this were our own analysis, we might first attempt to model these data against a Gaussian distribution and then explore whether this could lead to negative predictions - it may be that the bird abundances are sufficiently high that the issue of negative predictions is not realised over sensible ranges of the predictors. If it turns out that there is an issue, then we would explore alternatives.
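Such a check might look like the following sketch. It uses simulated data (the variable names and values are illustrative, not the loyn data) purely to show how a Gaussian model fitted to strictly positive responses can still predict below zero at the extremes of a predictor:

```r
## Simulated positive "abundances" that grow with log(area)
set.seed(1)
area <- exp(runif(50, 0, 5))                            # hypothetical patch areas
abund <- pmax(0.1, 2 * log(area) + rnorm(50, sd = 2))   # strictly positive response

## Fit a Gaussian model, then predict at very small patch areas
fit <- glm(abund ~ log(area), family = gaussian())
preds <- predict(fit,
  newdata = data.frame(area = seq(0.01, 1, length.out = 20)),
  type = "response")
any(preds < 0)   # negative predicted abundances occur at small areas
```

If `any(preds < 0)` returns `TRUE` over a sensible range of the predictors, that is the cue to consider the alternative distributions or link functions explored below.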
In this case, there is an issue - the model does indeed predict bird abundances below zero across parts of the range of some predictors. Hence, we will skip straight to the alternatives. We will explore each of the following options:

- log-transform the response and model it against a Gaussian distribution
- model the response against a Gaussian distribution with a log link
- model the response against a Gamma distribution with a log link
A scatterplot matrix plots each pair of variables against one another. It is useful to place boxplots along the diagonal to help explore the marginal distributions.
scatterplotMatrix(~ ABUND + DIST + LDIST + AREA + GRAZE + ALT + YR.ISOL,
data = loyn,
diagonal = list(method = "boxplot")
)
Conclusions:
scatterplotMatrix(~ ABUND + log(DIST) + log(LDIST) + log(AREA) + GRAZE + ALT + YR.ISOL,
data = loyn,
diagonal = list(method = "boxplot")
)
Conclusions:
The GRAZE predictor represents grazing intensity. Note that this was essentially a categorical variable with levels 1 (no grazing) through to 5 (intense grazing). Although we could attempt to model it as a continuous predictor, such an approach would assume a linear scale in which the interval between successive levels is the same - that is, that the difference between levels 1 and 2 is the same as the difference between levels 4 and 5, and so on. Since these were categories, such spacing is not guaranteed. It might therefore be better to model this variable as categorical.
Finally, even in the absence of issues relating to the range of each predictor within each level of GRAZE, it might still be advantageous to centre each predictor. Centering will provide computational advantages and also ensure that the intercept has a more useful meaning.
loyn <- loyn %>% mutate(fGRAZE = factor(GRAZE))
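A quick base-R illustration (on a standalone toy factor, not the loyn data) of what this conversion implies for the model: treatment (dummy) coding expands the factor into one column per non-reference level, so each grazing level gets its own effect rather than assuming equal spacing between levels.

```r
## Design matrix for a 5-level factor: intercept (reference level 1)
## plus one indicator column for each of levels 2-5
mm <- model.matrix(~ fGRAZE, data = data.frame(fGRAZE = factor(1:5)))
mm
```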
As the model now includes both continuous and categorical predictors, unless we include interactions between the categorical variable and each of the continuous predictors, we must also assume that the partial slope of each continuous predictor is the same for each level of the categorical predictor (homogeneity of slopes).
We will explore this with just two of the predictors (DIST and AREA - both on log axes)
p1 <- ggplot(loyn, aes(y = ABUND, x = DIST, color = fGRAZE)) +
geom_smooth(method = "lm") +
scale_x_log10()
p2 <- ggplot(loyn, aes(y = ABUND, x = AREA, color = fGRAZE)) +
geom_smooth(method = "lm") +
scale_x_log10()
p1 + p2
Conclusions:
One thing we could do to address the different ranges is scale (standardise) or re-scale AREA separately within each GRAZE level.
## might need to consider scaling separately within each group
loyn <- loyn %>%
group_by(fGRAZE) %>%
mutate(
rslAREA = scales::rescale(log(AREA)),
slAREA = scale(log(AREA))
) %>%
ungroup()
ggplot(loyn, aes(y = ABUND, x = slAREA, color = fGRAZE)) +
    geom_smooth(method = "lm")
ggplot(loyn, aes(y = ABUND, x = rslAREA, color = fGRAZE)) +
    geom_smooth(method = "lm")
For the current example, we will treat the continuous predictors as if they have satisfied this assumption.
Model formula:
\[ log(y_i) \sim{} \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i = \beta_0 + \beta_1 X_1 + ... \]
where \(\boldsymbol{\beta}\) is a vector of effects parameters and \(\bf{X}\) is a model matrix representing the additive effects of the scaled versions of distance (ln), distance to the nearest large patch (ln), patch area (ln), grazing intensity, year of isolation and altitude on the abundance of forest birds.
loyn.glm <- glm(log(ABUND) ~ scale(log(DIST), scale = FALSE) + scale(log(LDIST), scale = FALSE) +
scale(log(AREA), scale = FALSE) +
fGRAZE + scale(ALT, scale = FALSE) + scale(YR.ISOL, scale = FALSE),
data = loyn, family = gaussian()
)
Model formula: \[ y_i \sim{} \mathcal{N}(\mu_i, \sigma^2)\\ log(\mu_i) = \boldsymbol{\beta} \bf{X_i} \]
where \(\boldsymbol{\beta}\) is a vector of effects parameters and \(\bf{X}\) is a model matrix representing the additive effects of the scaled versions of distance (ln), distance to the nearest large patch (ln), patch area (ln), grazing intensity, year of isolation and altitude on the abundance of forest birds.
To centre a predictor in R, we can use the scale() function. This function can centre and scale a vector, or centre it only (by indicating that it should not scale). In addition to centering (and/or scaling if instructed), this function also stores the mean (and standard deviation) of the original predictor as attributes, thereby facilitating back-transformations.
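A small base-R illustration of these attributes (the vector `x` is just an illustrative set of distances):

```r
x <- c(39, 234, 104, 66, 246)        # illustrative values
cx <- scale(log(x), scale = FALSE)   # centred, but not scaled
attr(cx, "scaled:center")            # the mean that was subtracted

## adding the stored mean back recovers the original (log) values
recovered <- as.numeric(cx) + attr(cx, "scaled:center")
all.equal(recovered, log(x))
```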
loyn.glm1 <- glm(ABUND ~ scale(log(DIST), scale = FALSE) + scale(log(LDIST), scale = FALSE) +
scale(log(AREA), scale = FALSE) +
fGRAZE + scale(ALT, scale = FALSE) + scale(YR.ISOL, scale = FALSE),
data = loyn, family = gaussian(link = "log")
)
Model formula: \[ y_i \sim{} \mathcal{Gamma}(\mu_i, \phi)\\ log(\mu_i) = \beta_0 + \beta_1 X_1 + ... \]
where \(\boldsymbol{\beta}\) is a vector of effects parameters and \(\bf{X}\) is a model matrix representing the additive effects of the scaled versions of distance (ln), distance to the nearest large patch (ln), patch area (ln), grazing intensity, year of isolation and altitude on the abundance of forest birds.
loyn.glm2 <- glm(ABUND ~ scale(log(DIST), scale = FALSE) + scale(log(LDIST), scale = FALSE) +
scale(log(AREA), scale = FALSE) +
fGRAZE + scale(ALT, scale = FALSE) + scale(YR.ISOL, scale = FALSE),
data = loyn, family = Gamma(link = "log")
)
loyn.glm %>% vif()
## GVIF Df GVIF^(1/(2*Df))
## scale(log(DIST), scale = FALSE) 1.907874 1 1.381258
## scale(log(LDIST), scale = FALSE) 2.228456 1 1.492801
## scale(log(AREA), scale = FALSE) 2.201116 1 1.483616
## fGRAZE 7.925036 4 1.295314
## scale(ALT, scale = FALSE) 1.597400 1 1.263883
## scale(YR.ISOL, scale = FALSE) 3.252591 1 1.803494
## If any terms have df>1, then Generalized VIF calculated
## This is the inflation in size compared to what would be expected if orthogonal
loyn.glm1 %>% vif()
## GVIF Df GVIF^(1/(2*Df))
## scale(log(DIST), scale = FALSE) 1.964737 1 1.401691
## scale(log(LDIST), scale = FALSE) 2.004818 1 1.415916
## scale(log(AREA), scale = FALSE) 2.319905 1 1.523124
## fGRAZE 3.721110 4 1.178512
## scale(ALT, scale = FALSE) 1.527142 1 1.235776
## scale(YR.ISOL, scale = FALSE) 1.555248 1 1.247096
## If any terms have df>1, then Generalized VIF calculated
## This is the inflation in size compared to what would be expected if orthogonal
loyn.glm2 %>% vif()
## GVIF Df GVIF^(1/(2*Df))
## scale(log(DIST), scale = FALSE) 1.907874 1 1.381258
## scale(log(LDIST), scale = FALSE) 2.228456 1 1.492801
## scale(log(AREA), scale = FALSE) 2.201116 1 1.483616
## fGRAZE 7.925036 4 1.295314
## scale(ALT, scale = FALSE) 1.597400 1 1.263883
## scale(YR.ISOL, scale = FALSE) 3.252591 1 1.803494
## If any terms have df>1, then Generalized VIF calculated
## This is the inflation in size compared to what would be expected if orthogonal
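The last column of these tables can be reproduced by hand. GVIF^(1/(2*Df)) rescales the generalised VIF of a multi-df term so it is comparable with ordinary (single-df) VIFs; for example, for fGRAZE in the first model:

```r
gvif <- 7.925036      # GVIF for fGRAZE in the first model
df <- 4               # its degrees of freedom
gvif^(1 / (2 * df))   # ~1.2953, matching the table above
```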
loyn.glm %>% autoplot(which = 1:6)
loyn.glm1 %>% autoplot(which = 1:6)
loyn.glm2 %>% autoplot(which = 1:6)
loyn.glm %>% performance::check_model()
loyn.glm1 %>% performance::check_model()
loyn.glm2 %>% performance::check_model()
loyn.resid <- loyn.glm %>% simulateResiduals(plot = TRUE)
loyn.resid1 <- loyn.glm1 %>% simulateResiduals(plot = TRUE)
loyn.resid2 <- loyn.glm2 %>% simulateResiduals(plot = TRUE)
loyn <- loyn %>%
mutate(Resids = loyn.resid$scaledResiduals)
loyn %>%
mutate(
clDIST = scale(log(DIST), scale = FALSE),
clLDIST = scale(log(LDIST), scale = FALSE),
clAREA = scale(log(AREA), scale = FALSE),
cALT = scale(ALT, scale = FALSE),
cYR.ISOL = scale(YR.ISOL, scale = FALSE)
) %>%
dplyr::select(clDIST, clLDIST, clAREA, fGRAZE, cALT, cYR.ISOL, Resids) %>%
pivot_longer(c(clDIST, clLDIST, clAREA, cALT, cYR.ISOL)) %>%
ggplot() +
geom_point(aes(y = Resids, x = value, color = fGRAZE)) +
facet_wrap(~name, scales = "free")
loyn <- loyn %>%
mutate(Resids = loyn.resid1$scaledResiduals)
loyn %>%
mutate(
clDIST = scale(log(DIST), scale = FALSE),
clLDIST = scale(log(LDIST), scale = FALSE),
clAREA = scale(log(AREA), scale = FALSE),
cALT = scale(ALT, scale = FALSE),
cYR.ISOL = scale(YR.ISOL, scale = FALSE)
) %>%
dplyr::select(clDIST, clLDIST, clAREA, fGRAZE, cALT, cYR.ISOL, Resids) %>%
pivot_longer(c(clDIST, clLDIST, clAREA, cALT, cYR.ISOL)) %>%
ggplot() +
geom_point(aes(y = Resids, x = value, color = fGRAZE)) +
facet_wrap(~name, scales = "free")
loyn <- loyn %>%
mutate(Resids = loyn.resid2$scaledResiduals)
loyn %>%
mutate(
clDIST = scale(log(DIST), scale = FALSE),
clLDIST = scale(log(LDIST), scale = FALSE),
clAREA = scale(log(AREA), scale = FALSE),
cALT = scale(ALT, scale = FALSE),
cYR.ISOL = scale(YR.ISOL, scale = FALSE)
) %>%
dplyr::select(clDIST, clLDIST, clAREA, fGRAZE, cALT, cYR.ISOL, Resids) %>%
pivot_longer(c(clDIST, clLDIST, clAREA, cALT, cYR.ISOL)) %>%
ggplot() +
geom_point(aes(y = Resids, x = value, color = fGRAZE)) +
facet_wrap(~name, scales = "free")
## Effects
loyn.glm %>%
plot_model(type = "eff", show.data = TRUE, dot.size = 0.5) %>%
plot_grid()
## Predictions
loyn.glm %>%
plot_model(type = "pred", show.data = TRUE, dot.size = 0.5) %>%
plot_grid()
## We can sort of do part of the backtransforms - but only partly..
plot_grid(list(
loyn.glm %>% plot_model(type = "eff", terms = "DIST") + scale_x_log10(),
loyn.glm %>% plot_model(type = "eff", terms = "LDIST") + scale_x_log10(),
loyn.glm %>% plot_model(type = "eff", terms = "AREA") + scale_x_log10(),
loyn.glm %>% plot_model(type = "eff", terms = "fGRAZE"),
loyn.glm %>% plot_model(type = "eff", terms = "ALT"),
loyn.glm %>% plot_model(type = "eff", terms = "YR.ISOL")
))
## Effects
loyn.glm1 %>%
plot_model(type = "eff", show.data = TRUE, dot.size = 0.5) %>%
plot_grid()
## Predictions
loyn.glm1 %>%
plot_model(type = "pred", show.data = TRUE, dot.size = 0.5) %>%
plot_grid()
## We can sort of do part of the backtransforms - but only partly..
plot_grid(list(
loyn.glm1 %>% plot_model(type = "eff", terms = "DIST") + scale_x_log10(),
loyn.glm1 %>% plot_model(type = "eff", terms = "LDIST") + scale_x_log10(),
loyn.glm1 %>% plot_model(type = "eff", terms = "AREA") + scale_x_log10(),
loyn.glm1 %>% plot_model(type = "eff", terms = "fGRAZE"),
loyn.glm1 %>% plot_model(type = "eff", terms = "ALT"),
loyn.glm1 %>% plot_model(type = "eff", terms = "YR.ISOL")
))
## Effects
loyn.glm2 %>%
plot_model(type = "eff", show.data = TRUE, dot.size = 0.5) %>%
plot_grid()
## Predictions
loyn.glm2 %>%
plot_model(type = "pred", show.data = TRUE, dot.size = 0.5) %>%
plot_grid()
## We can sort of do part of the backtransforms - but only partly..
plot_grid(list(
loyn.glm2 %>% plot_model(type = "eff", terms = "DIST") + scale_x_log10(),
loyn.glm2 %>% plot_model(type = "eff", terms = "LDIST") + scale_x_log10(),
loyn.glm2 %>% plot_model(type = "eff", terms = "AREA") + scale_x_log10(),
loyn.glm2 %>% plot_model(type = "eff", terms = "fGRAZE"),
loyn.glm2 %>% plot_model(type = "eff", terms = "ALT"),
loyn.glm2 %>% plot_model(type = "eff", terms = "YR.ISOL")
))
loyn.glm %>%
allEffects(residuals = TRUE) %>%
plot(type = "response")
loyn.glm1 %>%
allEffects(residuals = TRUE) %>%
plot(type = "response")
loyn.glm2 %>%
allEffects(residuals = TRUE) %>%
plot(type = "response")
loyn.glm %>%
ggpredict() %>%
plot(add.data = TRUE, facet = TRUE, jitter = FALSE)
loyn.glm1 %>%
ggpredict() %>%
plot(add.data = TRUE, facet = TRUE, jitter = FALSE)
loyn.glm2 %>%
ggpredict() %>%
plot(add.data = TRUE, facet = TRUE, jitter = FALSE)
loyn.glm %>%
ggemmeans(~ AREA | fGRAZE) %>%
plot(add.data = TRUE, jitter = FALSE)
loyn.glm %>%
ggemmeans(~ AREA | fGRAZE) %>%
plot(add.data = TRUE, jitter = FALSE) +
scale_x_log10()
## unfortunately, it is not currently possible to influence the
## prediction grid so we are stuck with awful looking figures.
loyn.glm1 %>%
ggemmeans(~ AREA | fGRAZE) %>%
plot(add.data = TRUE, jitter = FALSE)
loyn.glm1 %>%
ggemmeans(~ AREA | fGRAZE) %>%
plot(add.data = TRUE, jitter = FALSE) +
scale_x_log10()
## unfortunately, it is not currently possible to influence the
## prediction grid so we are stuck with awful looking figures.
loyn.glm2 %>%
ggemmeans(~ AREA | fGRAZE) %>%
plot(add.data = TRUE, jitter = FALSE)
loyn.glm2 %>%
ggemmeans(~ AREA | fGRAZE) %>%
plot(add.data = TRUE, jitter = FALSE) +
scale_x_log10()
## unfortunately, it is not currently possible to influence the
## prediction grid so we are stuck with awful looking figures.
loyn.glm %>% plot_model(type = "est")
loyn.glm %>% plot_model(type = "est", transform = "exp", show.values = TRUE)
loyn.glm1 %>% plot_model(type = "est")
loyn.glm1 %>% plot_model(type = "est", transform = "exp", show.values = TRUE)
loyn.glm2 %>% plot_model(type = "est")
loyn.glm2 %>% plot_model(type = "est", transform = "exp", show.values = TRUE)
loyn.glm %>% summary()
##
## Call:
## glm(formula = log(ABUND) ~ scale(log(DIST), scale = FALSE) +
## scale(log(LDIST), scale = FALSE) + scale(log(AREA), scale = FALSE) +
## fGRAZE + scale(ALT, scale = FALSE) + scale(YR.ISOL, scale = FALSE),
## family = gaussian(), data = loyn)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.52735 -0.19456 0.05251 0.28999 1.24743
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.794641 0.209884 13.315 < 2e-16 ***
## scale(log(DIST), scale = FALSE) 0.048037 0.106225 0.452 0.653237
## scale(log(LDIST), scale = FALSE) 0.049897 0.082636 0.604 0.548933
## scale(log(AREA), scale = FALSE) 0.217187 0.058115 3.737 0.000513 ***
## fGRAZE2 0.232934 0.289494 0.805 0.425177
## fGRAZE3 0.225478 0.263369 0.856 0.396363
## fGRAZE4 0.097774 0.284703 0.343 0.732844
## fGRAZE5 -0.890477 0.425336 -2.094 0.041838 *
## scale(ALT, scale = FALSE) 0.001325 0.002128 0.623 0.536622
## scale(YR.ISOL, scale = FALSE) 0.001710 0.005166 0.331 0.742209
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.2953154)
##
## Null deviance: 44.791 on 55 degrees of freedom
## Residual deviance: 13.585 on 46 degrees of freedom
## AIC: 101.6
##
## Number of Fisher Scoring iterations: 2
Conclusions:
loyn.glm1 %>% summary()
##
## Call:
## glm(formula = ABUND ~ scale(log(DIST), scale = FALSE) + scale(log(LDIST),
## scale = FALSE) + scale(log(AREA), scale = FALSE) + fGRAZE +
## scale(ALT, scale = FALSE) + scale(YR.ISOL, scale = FALSE),
## family = gaussian(link = "log"), data = loyn)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -15.9887 -3.3663 -0.7579 4.0166 12.8221
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.092e+00 1.129e-01 27.400 < 2e-16 ***
## scale(log(DIST), scale = FALSE) -1.896e-02 6.008e-02 -0.316 0.75373
## scale(log(LDIST), scale = FALSE) 4.313e-02 4.531e-02 0.952 0.34623
## scale(log(AREA), scale = FALSE) 1.090e-01 3.187e-02 3.421 0.00132 **
## fGRAZE2 3.964e-02 1.490e-01 0.266 0.79134
## fGRAZE3 1.965e-02 1.378e-01 0.143 0.88717
## fGRAZE4 -1.197e-03 1.562e-01 -0.008 0.99392
## fGRAZE5 -1.122e+00 3.436e-01 -3.264 0.00208 **
## scale(ALT, scale = FALSE) -7.436e-05 1.117e-03 -0.067 0.94719
## scale(YR.ISOL, scale = FALSE) -7.133e-04 2.909e-03 -0.245 0.80737
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 42.40928)
##
## Null deviance: 6337.9 on 55 degrees of freedom
## Residual deviance: 1950.8 on 46 degrees of freedom
## AIC: 379.76
##
## Number of Fisher Scoring iterations: 6
Conclusions:
loyn.glm2 %>% summary()
##
## Call:
## glm(formula = ABUND ~ scale(log(DIST), scale = FALSE) + scale(log(LDIST),
## scale = FALSE) + scale(log(AREA), scale = FALSE) + fGRAZE +
## scale(ALT, scale = FALSE) + scale(YR.ISOL, scale = FALSE),
## family = Gamma(link = "log"), data = loyn)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.36313 -0.30529 -0.01334 0.18839 1.20999
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.884175 0.183515 15.716 < 2e-16 ***
## scale(log(DIST), scale = FALSE) 0.052006 0.092880 0.560 0.578246
## scale(log(LDIST), scale = FALSE) 0.002178 0.072255 0.030 0.976083
## scale(log(AREA), scale = FALSE) 0.211406 0.050814 4.160 0.000137 ***
## fGRAZE2 0.157832 0.253125 0.624 0.536013
## fGRAZE3 0.242847 0.230281 1.055 0.297130
## fGRAZE4 0.073447 0.248935 0.295 0.769289
## fGRAZE5 -0.776904 0.371900 -2.089 0.042267 *
## scale(ALT, scale = FALSE) 0.001120 0.001860 0.602 0.550250
## scale(YR.ISOL, scale = FALSE) 0.001578 0.004517 0.349 0.728467
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 0.2257743)
##
## Null deviance: 31.061 on 55 degrees of freedom
## Residual deviance: 11.960 on 46 degrees of freedom
## AIC: 398.17
##
## Number of Fisher Scoring iterations: 7
Conclusions:
loyn.glm %>% confint()
## 2.5 % 97.5 %
## (Intercept) 2.383277032 3.206005503
## scale(log(DIST), scale = FALSE) -0.160160724 0.256234688
## scale(log(LDIST), scale = FALSE) -0.112067311 0.211861575
## scale(log(AREA), scale = FALSE) 0.103283017 0.331091188
## fGRAZE2 -0.334465021 0.800332435
## fGRAZE3 -0.290714978 0.741670896
## fGRAZE4 -0.460233638 0.655781439
## fGRAZE5 -1.724121112 -0.056833429
## scale(ALT, scale = FALSE) -0.002845377 0.005494663
## scale(YR.ISOL, scale = FALSE) -0.008415597 0.011834628
loyn.glm %>%
confint() %>%
exp()
## 2.5 % 97.5 %
## (Intercept) 10.8403690 24.6803037
## scale(log(DIST), scale = FALSE) 0.8520068 1.2920559
## scale(log(LDIST), scale = FALSE) 0.8939841 1.2359768
## scale(log(AREA), scale = FALSE) 1.1088052 1.3924868
## fGRAZE2 0.7157209 2.2262809
## fGRAZE3 0.7477288 2.0994405
## fGRAZE4 0.6311362 1.9266475
## fGRAZE5 0.1783297 0.9447514
## scale(ALT, scale = FALSE) 0.9971587 1.0055098
## scale(YR.ISOL, scale = FALSE) 0.9916197 1.0119049
loyn.glm1 %>% confint()
## 2.5 % 97.5 %
## (Intercept) 2.857373688 3.304173722
## scale(log(DIST), scale = FALSE) -0.137764975 0.103001378
## scale(log(LDIST), scale = FALSE) -0.048993891 0.130399690
## scale(log(AREA), scale = FALSE) 0.048471352 0.172908096
## fGRAZE2 -0.270270322 0.329472512
## fGRAZE3 -0.253401689 0.291835537
## fGRAZE4 -0.328680891 0.304845014
## fGRAZE5 -1.964415830 -0.489194377
## scale(ALT, scale = FALSE) -0.002307881 0.002204718
## scale(YR.ISOL, scale = FALSE) -0.006029603 0.005436731
loyn.glm1 %>%
confint() %>%
exp()
## 2.5 % 97.5 %
## (Intercept) 17.4157277 27.2260360
## scale(log(DIST), scale = FALSE) 0.8713034 1.1084929
## scale(log(LDIST), scale = FALSE) 0.9521869 1.1392837
## scale(log(AREA), scale = FALSE) 1.0496653 1.1887568
## fGRAZE2 0.7631732 1.3902346
## fGRAZE3 0.7761560 1.3388828
## fGRAZE4 0.7198727 1.3564148
## fGRAZE5 0.1402378 0.6131201
## scale(ALT, scale = FALSE) 0.9976948 1.0022072
## scale(YR.ISOL, scale = FALSE) 0.9939885 1.0054515
loyn.glm2 %>% confint()
## 2.5 % 97.5 %
## (Intercept) 2.541390668 3.247318504
## scale(log(DIST), scale = FALSE) -0.136276672 0.231770632
## scale(log(LDIST), scale = FALSE) -0.139092639 0.148026636
## scale(log(AREA), scale = FALSE) 0.104123690 0.319147655
## fGRAZE2 -0.325629378 0.655552869
## fGRAZE3 -0.216533500 0.701556277
## fGRAZE4 -0.396478859 0.558696246
## fGRAZE5 -1.490328207 -0.078310395
## scale(ALT, scale = FALSE) -0.002428384 0.004625928
## scale(YR.ISOL, scale = FALSE) -0.007727843 0.010332742
loyn.glm2 %>%
confint() %>%
exp()
## 2.5 % 97.5 %
## (Intercept) 12.6973164 25.7212759
## scale(log(DIST), scale = FALSE) 0.8726012 1.2608305
## scale(log(LDIST), scale = FALSE) 0.8701474 1.1595438
## scale(log(AREA), scale = FALSE) 1.1097377 1.3759545
## fGRAZE2 0.7220728 1.9262072
## fGRAZE3 0.8053056 2.0168891
## fGRAZE4 0.6726845 1.7483915
## fGRAZE5 0.2252987 0.9246774
## scale(ALT, scale = FALSE) 0.9975746 1.0046366
## scale(YR.ISOL, scale = FALSE) 0.9923019 1.0103863
We can tidy the summary table and express the coefficients on the response scale. Recall that when we do so, the partial slopes are no longer additive. Rather, they represent a multiplicative change. Furthermore, effects greater than 1 represent an increase, whereas effects less than 1 represent a decrease. Hence, an effect of 0.5 represents a halving.
Note also that confidence intervals should be interpreted relative to a value of 1 after such a back-transformation.
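For example, taking the centred log(AREA) coefficient reported in the summary of the log-transformed Gaussian model (0.217187):

```r
## Back-transforming a single coefficient from the log scale
b <- 0.217187   # scale(log(AREA)) estimate from summary(loyn.glm)
exp(b)          # ~1.24: each unit increase in log(AREA) multiplies
                # the expected abundance by about 1.24 (a ~24% increase)
```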
loyn.glm %>% tidy(conf.int = TRUE, exponentiate = TRUE)
loyn.glm %>% glance()
We can tidy the summary table and express the coefficients on the response scale. Recall that when we do so, the partial slopes are no longer additive. Rather, they represent a multiplicative change. Furthermore, effects greater than 1 represent an increase, whereas effects less than 1 represent a decrease. Hence, an effect of 0.5 represents a halving.
Note also that confidence intervals should be interpreted relative to a value of 1 after such a back-transformation.
loyn.glm1 %>% tidy(conf.int = TRUE, exponentiate = TRUE)
loyn.glm1 %>% glance()
We can tidy the summary table and express the coefficients on the response scale. Recall that when we do so, the partial slopes are no longer additive. Rather, they represent a multiplicative change. Furthermore, effects greater than 1 represent an increase, whereas effects less than 1 represent a decrease. Hence, an effect of 0.5 represents a halving.
Note also that confidence intervals should be interpreted relative to a value of 1 after such a back-transformation.
loyn.glm2 %>% tidy(conf.int = TRUE, exponentiate = TRUE)
loyn.glm2 %>% glance()
The magnitude of a (partial) slope depends on the scale of the underlying predictor. Therefore, we cannot gauge the relative importance of the predictors simply by comparing their partial slopes. We could refit the model with scaled (and centred) versions of the predictors - this would put all partial slopes on a common scale, thereby permitting comparisons. However, it would also make the individual partial slopes difficult to interpret on the observed predictor scales. For example, what does a change of one unit on a scaled scale mean in real-world measurements?
The alternative is to fit the model with unscaled (yet centred) predictors (as we have done) and then standardise the partial slopes by multiplication with the respective predictor standard deviations divided by the standard deviation of the response. This can be performed by the std.coef() function in the MuMIn package. This function can also correct for a certain degree of (multi)collinearity by standardising by partial standard deviation.
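The essence of this standardisation (without the partial-sd correction) can be sketched in base R with simulated data: multiplying the raw slope by sd(x)/sd(y) gives the same value as the slope from a regression of the standardised variables.

```r
set.seed(1)
x <- rnorm(30, mean = 100, sd = 15)   # illustrative predictor
y <- 2 + 0.1 * x + rnorm(30)          # illustrative response

b <- coef(lm(y ~ x))["x"]             # raw (unstandardised) slope
b_std <- unname(b * sd(x) / sd(y))    # standardised slope

## identical to regressing the standardised variables directly
b_check <- unname(coef(lm(scale(y) ~ scale(x)))[2])
all.equal(b_std, b_check)
```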
loyn.glm %>% std.coef(partial.sd = TRUE)
## Estimate* Std. Error* df
## (Intercept) 0.000000 0.000000 46
## scale(log(DIST), scale = FALSE) 0.035846 0.079267 46
## scale(log(LDIST), scale = FALSE) 0.047863 0.079267 46
## scale(log(AREA), scale = FALSE) 0.296235 0.079267 46
## fGRAZE2 0.063780 0.079267 46
## fGRAZE3 0.067863 0.079267 46
## fGRAZE4 0.027222 0.079267 46
## fGRAZE5 -0.165953 0.079267 46
## scale(ALT, scale = FALSE) 0.049352 0.079267 46
## scale(YR.ISOL, scale = FALSE) 0.026231 0.079267 46
The magnitude of a (partial) slope depends on the scale of the underlying predictor. Therefore, we cannot gauge the relative importance of the predictors simply by comparing their partial slopes. We could refit the model with scaled (and centred) versions of the predictors - this would put all partial slopes on a common scale, thereby permitting comparisons. However, it would also make the individual partial slopes difficult to interpret on the observed predictor scales. For example, what does a change of one unit on a scaled scale mean in real-world measurements?
The alternative is to fit the model with unscaled (yet centred) predictors (as we have done) and then standardise the partial slopes by multiplication with the respective predictor standard deviations divided by the standard deviation of the response. This can be performed by the std.coef() function in the MuMIn package. This function can also correct for a certain degree of (multi)collinearity by standardising by partial standard deviation.
loyn.glm1 %>% std.coef(partial.sd = TRUE)
## Estimate* Std. Error* df
## (Intercept) 0.00000000 0.00000000 46
## scale(log(DIST), scale = FALSE) -0.01394345 0.04418064 46
## scale(log(LDIST), scale = FALSE) 0.04361398 0.04582768 46
## scale(log(AREA), scale = FALSE) 0.14486900 0.04234351 46
## fGRAZE2 0.01112516 0.04180671 46
## fGRAZE3 0.00625722 0.04385612 46
## fGRAZE4 -0.00034329 0.04478860 46
## fGRAZE5 -0.41005751 0.12563578 46
## scale(ALT, scale = FALSE) -0.00283343 0.04254895 46
## scale(YR.ISOL, scale = FALSE) -0.01582719 0.06454024 46
The magnitude of a (partial) slope depends on the scale of the underlying predictor. Therefore, we cannot gauge the relative importance of the predictors simply by comparing their partial slopes. We could refit the model with scaled (and centred) versions of the predictors - this would put all partial slopes on a common scale, thereby permitting comparisons. However, it would also make the individual partial slopes difficult to interpret on the observed predictor scales. For example, what does a change of one unit on a scaled scale mean in real-world measurements?
The alternative is to fit the model with unscaled (yet centred) predictors (as we have done) and then standardise the partial slopes by multiplication with the respective predictor standard deviations divided by the standard deviation of the response. This can be performed by the std.coef() function in the MuMIn package. This function can also correct for a certain degree of (multi)collinearity by standardising by partial standard deviation.
loyn.glm2 %>% std.coef(partial.sd = TRUE)
## Estimate* Std. Error* df
## (Intercept) 0.0000000 0.0000000 46
## scale(log(DIST), scale = FALSE) 0.0388078 0.0693088 46
## scale(log(LDIST), scale = FALSE) 0.0020892 0.0693088 46
## scale(log(AREA), scale = FALSE) 0.2883498 0.0693088 46
## fGRAZE2 0.0432165 0.0693088 46
## fGRAZE3 0.0730909 0.0693088 46
## fGRAZE4 0.0204491 0.0693088 46
## fGRAZE5 -0.1447868 0.0693088 46
## scale(ALT, scale = FALSE) 0.0417113 0.0693088 46
## scale(YR.ISOL, scale = FALSE) 0.0242087 0.0693088 46
From this point on, we will only proceed with the Gaussian (log-link) model.
loyn.glm1 %>% MuMIn::r.squaredLR()
## [1] 0.6922071
## attr(,"adj.r.squared")
## [1] 0.6925654
loyn.glm1 %>% AIC()
## [1] 379.7562
loyn.glm1 %>% AICc()
## [1] 385.7562
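The relationship between the AIC and AICc values above follows from the AICc definition (here k counts the 10 regression coefficients plus the dispersion parameter):

```r
## AICc = AIC + 2k(k+1)/(n - k - 1)
aic <- 379.7562
k <- 11   # 10 coefficients + dispersion
n <- 56   # number of patches
aic + 2 * k * (k + 1) / (n - k - 1)   # 385.7562, matching AICc() above
```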
So far, the model that we have fit includes six predictors. If we had more predictors, provided they were not correlated with the other predictors, we could add them in an attempt to further reduce the unexplained variance. However, as we do so, the complexity of the model increases (particularly if we start introducing interactions).
If we keep in mind that linear models are intentionally low-dimensional representations that are intended to provide insights into the effects of major drivers, and are not intended to reconstruct the full system complexity, we should realise that proposing overly complex models goes contrary to these intentions. Even the model we already have might be considered overly complex. Some of the predictors were found to have very little impact on the response, so a fair question might be whether all the predictors really are necessary.
It is possible to fit all combinations of models containing the predictors and then compare their fits via an information criterion. This is referred to as dredging.
# Option 1 - dredge
loyn.glm1 <- loyn.glm1 %>% update(na.action = na.fail)
# options(width=1000)
loyn.glm1 %>% dredge(rank = "AICc")
Conclusions:
The top-ranked model (lowest AICc) includes only fGRAZE and scale(log(AREA)) in addition to the intercept. We might consider the model with the lowest AICc to be the 'best' model. Any other models within 2 AICc units would be considered equivalent, and thus we might instead take the 'best' model to be the one with the fewest used degrees of freedom (the simplest) among those within 2 units of the lowest AICc. In this case, the model with the lowest AICc is also the simplest of the top-ranking models - and thus the choice might seem clear.
The problem with this approach is that mathematically, one model must bubble up to be the ‘best’ model. This model is not necessarily the most sensible or appropriate model, it is just the model that the data supports most strongly.
It is also a form of ‘p-hacking’ in which an exhaustive set of models are fit in order to find something that is ‘significant’. Given that the probability of a type I error (finding a significant effect when it does not occur) for any single model is 0.05 (5%) and that this compounds with multiple tests, the ‘best’ model might actually be a type I error.
When we run the full model, the partial effects are the estimated effects of each predictor, holding the remaining predictors constant. In this way, each partial effect is standardised for the other predictors. This means that the value (and uncertainty) of any estimated partial slope depends partly on which other predictors are present in the model.
Given this, it might be interesting to explore a range of models and average their parameter estimates. This is called model averaging. Rather than average across all models, we only average across models with reasonable fit; we can consider well-fitting models to be those within (for example) 4 AICc units of the best model. Furthermore, rather than simply averaging all the models together equally, we weight each model by its relative AICc support, such that models with lower AICc receive higher weights.
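The weights reported in the component-model table are simple transformations of the delta AICc values: w_i = exp(-Δ_i/2) / Σ exp(-Δ_j/2). A quick sketch, using the deltas from the dredge output (0.00, 1.59, 2.51, 2.67, 2.73):

```r
# Akaike weights computed from delta AICc values
delta <- c(0.00, 1.59, 2.51, 2.67, 2.73)
w <- exp(-delta / 2) / sum(exp(-delta / 2))
round(w, 2) # 0.44 0.20 0.13 0.12 0.11 -- matching the weight column
```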
## 2. model averaging
loyn.av <- loyn.glm1 %>%
dredge(rank = "AICc") %>%
model.avg(subset = delta <= 4)
loyn.av %>% summary()
##
## Call:
## model.avg(object = ., subset = delta <= 4)
##
## Component model call:
## glm(formula = ABUND ~ <5 unique rhs>, family = gaussian(link = "log"),
## data = loyn, na.action = na.fail)
##
## Component models:
## df logLik AICc delta weight
## 13 7 -179.54 375.41 0.00 0.44
## 135 8 -178.97 377.01 1.59 0.20
## 123 8 -179.43 377.93 2.51 0.13
## 134 8 -179.51 378.08 2.67 0.12
## 136 8 -179.54 378.14 2.73 0.11
##
## Term codes:
## fGRAZE scale(ALT, scale = FALSE)
## 1 2
## scale(log(AREA), scale = FALSE) scale(log(DIST), scale = FALSE)
## 3 4
## scale(log(LDIST), scale = FALSE) scale(YR.ISOL, scale = FALSE)
## 5 6
##
## Model-averaged coefficients:
## (full average)
## Estimate Std. Error Adjusted SE z value
## (Intercept) 3.078e+00 9.609e-02 9.849e-02 31.253
## fGRAZE2 4.550e-02 1.311e-01 1.344e-01 0.339
## fGRAZE3 2.213e-02 1.224e-01 1.254e-01 0.176
## fGRAZE4 -1.761e-02 1.482e-01 1.519e-01 0.116
## fGRAZE5 -1.047e+00 2.919e-01 2.992e-01 3.499
## scale(log(AREA), scale = FALSE) 1.163e-01 2.727e-02 2.794e-02 4.162
## scale(log(LDIST), scale = FALSE) 7.184e-03 2.162e-02 2.193e-02 0.328
## scale(ALT, scale = FALSE) -5.399e-05 3.744e-04 3.826e-04 0.141
## scale(log(DIST), scale = FALSE) 1.443e-03 1.781e-02 1.824e-02 0.079
## scale(YR.ISOL, scale = FALSE) -7.294e-06 9.374e-04 9.612e-04 0.008
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## fGRAZE2 0.734893
## fGRAZE3 0.859954
## fGRAZE4 0.907685
## fGRAZE5 0.000467 ***
## scale(log(AREA), scale = FALSE) 3.16e-05 ***
## scale(log(LDIST), scale = FALSE) 0.743195
## scale(ALT, scale = FALSE) 0.887770
## scale(log(DIST), scale = FALSE) 0.936950
## scale(YR.ISOL, scale = FALSE) 0.993945
##
## (conditional average)
## Estimate Std. Error Adjusted SE z value
## (Intercept) 3.078e+00 9.609e-02 9.849e-02 31.253
## fGRAZE2 4.550e-02 1.311e-01 1.344e-01 0.339
## fGRAZE3 2.213e-02 1.224e-01 1.254e-01 0.176
## fGRAZE4 -1.761e-02 1.482e-01 1.519e-01 0.116
## fGRAZE5 -1.047e+00 2.919e-01 2.992e-01 3.499
## scale(log(AREA), scale = FALSE) 1.163e-01 2.727e-02 2.794e-02 4.162
## scale(log(LDIST), scale = FALSE) 3.594e-02 3.612e-02 3.704e-02 0.970
## scale(ALT, scale = FALSE) -4.275e-04 9.749e-04 9.996e-04 0.428
## scale(log(DIST), scale = FALSE) 1.233e-02 5.076e-02 5.205e-02 0.237
## scale(YR.ISOL, scale = FALSE) -6.439e-05 2.785e-03 2.855e-03 0.023
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## fGRAZE2 0.734893
## fGRAZE3 0.859954
## fGRAZE4 0.907685
## fGRAZE5 0.000467 ***
## scale(log(AREA), scale = FALSE) 3.16e-05 ***
## scale(log(LDIST), scale = FALSE) 0.331901
## scale(ALT, scale = FALSE) 0.668915
## scale(log(DIST), scale = FALSE) 0.812720
## scale(YR.ISOL, scale = FALSE) 0.982007
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
loyn.av %>% confint()
## 2.5 % 97.5 %
## (Intercept) 2.885196162 3.271288353
## fGRAZE2 -0.217834427 0.308825884
## fGRAZE3 -0.223690009 0.267946351
## fGRAZE4 -0.315238438 0.280020670
## fGRAZE5 -1.633107301 -0.460397973
## scale(log(AREA), scale = FALSE) 0.061512686 0.171038037
## scale(log(LDIST), scale = FALSE) -0.036656378 0.108533548
## scale(ALT, scale = FALSE) -0.002386552 0.001531653
## scale(log(DIST), scale = FALSE) -0.089679093 0.114340744
## scale(YR.ISOL, scale = FALSE) -0.005660513 0.005531723
Conclusions:
Arguably, if our intentions are to gain insights into what influences bird abundances in fragmented forests, rather than fit a model with a large range of predictors, it might be better to fit a number of models, each of which focuses on a different aspect of the avian ecology.
Rather than search for the single ‘best’ model, we could propose a small set of ecologically plausible candidate models and then see which ones are supported by the data. For example, in the current context of birds in fragmented forests, we could propose the following models:
Note, these are all models that could have been proposed prior to even collecting the data, and all can be explained ecologically. By contrast, dredging could, by chance, suggest a model with a combination of predictors that is very difficult to interpret and explain.
As the above models contain fewer predictors, there is now scope to include interactions.
loyn.glm1a <- loyn.glm1 %>% update(. ~ scale(log(DIST), scale = FALSE) * scale(log(LDIST), scale = FALSE))
loyn.glm1b <- loyn.glm1 %>% update(. ~ scale(log(AREA), scale = FALSE) * fGRAZE)
loyn.glm1c <- loyn.glm1 %>% update(. ~ scale(log(AREA), scale = FALSE) * fGRAZE * scale(YR.ISOL, scale = FALSE))
loyn.glm1d <- loyn.glm1 %>% update(. ~ scale(ALT, scale = FALSE))
loyn.null <- loyn.glm1 %>% update(. ~ 1)
AICc(loyn.glm1a, loyn.glm1b, loyn.glm1c, loyn.glm1d, loyn.null)
AICc(loyn.glm1a, loyn.glm1b, loyn.glm1c, loyn.glm1d, loyn.null) %>%
mutate(Model = row.names(.), delta = AICc - AICc[5]) %>%
dplyr::select(Model, everything())
## Support for models 2, 3 and 4; no support for model 1
Conclusions:
loyn.glm1b %>% summary()
##
## Call:
## glm(formula = ABUND ~ scale(log(AREA), scale = FALSE) + fGRAZE +
## scale(log(AREA), scale = FALSE):fGRAZE, family = gaussian(link = "log"),
## data = loyn, na.action = na.fail)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -15.9012 -2.4444 0.1277 2.6674 11.4148
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.24321 0.08978 36.126 < 2e-16
## scale(log(AREA), scale = FALSE) 0.05475 0.03097 1.768 0.083733
## fGRAZE2 -0.15645 0.13308 -1.176 0.245792
## fGRAZE3 -0.15315 0.11314 -1.354 0.182452
## fGRAZE4 -0.23802 0.15036 -1.583 0.120291
## fGRAZE5 -1.00394 0.24264 -4.138 0.000148
## scale(log(AREA), scale = FALSE):fGRAZE2 0.12339 0.06617 1.865 0.068608
## scale(log(AREA), scale = FALSE):fGRAZE3 0.15174 0.06501 2.334 0.024019
## scale(log(AREA), scale = FALSE):fGRAZE4 0.43933 0.22471 1.955 0.056659
## scale(log(AREA), scale = FALSE):fGRAZE5 0.31603 0.18885 1.673 0.101027
##
## (Intercept) ***
## scale(log(AREA), scale = FALSE) .
## fGRAZE2
## fGRAZE3
## fGRAZE4
## fGRAZE5 ***
## scale(log(AREA), scale = FALSE):fGRAZE2 .
## scale(log(AREA), scale = FALSE):fGRAZE3 *
## scale(log(AREA), scale = FALSE):fGRAZE4 .
## scale(log(AREA), scale = FALSE):fGRAZE5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 32.33763)
##
## Null deviance: 6337.9 on 55 degrees of freedom
## Residual deviance: 1487.5 on 46 degrees of freedom
## AIC: 364.57
##
## Number of Fisher Scoring iterations: 7
Conclusions:
We do know that the partial slope associated with Area for Grazing level 3 differs from that associated with Grazing level 1, yet we don’t directly know what the slope for Grazing level 1 is, nor how it compares to the slopes for the other Grazing levels. We can use the `emtrends()` function to explore this further.
loyn.glm1b %>% emtrends(pairwise ~ fGRAZE, var = "log(AREA)")
## $emtrends
## fGRAZE log(AREA).trend SE df lower.CL upper.CL
## 1 0.0547 0.0310 46 -0.00759 0.117
## 2 0.1781 0.0585 46 0.06043 0.296
## 3 0.2065 0.0572 46 0.09143 0.322
## 4 0.4941 0.2226 46 0.04609 0.942
## 5 0.3708 0.1863 46 -0.00421 0.746
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## 1 - 2 -0.1234 0.0662 46 -1.865 0.3505
## 1 - 3 -0.1517 0.0650 46 -2.334 0.1528
## 1 - 4 -0.4393 0.2247 46 -1.955 0.3039
## 1 - 5 -0.3160 0.1888 46 -1.673 0.4600
## 2 - 3 -0.0284 0.0818 46 -0.347 0.9968
## 2 - 4 -0.3159 0.2301 46 -1.373 0.6477
## 2 - 5 -0.1926 0.1953 46 -0.987 0.8600
## 3 - 4 -0.2876 0.2298 46 -1.252 0.7215
## 3 - 5 -0.1643 0.1949 46 -0.843 0.9156
## 4 - 5 0.1233 0.2902 46 0.425 0.9930
##
## P value adjustment: tukey method for comparing a family of 5 estimates
loyn.glm1b %>% emtrends(pairwise ~ fGRAZE, var = "AREA")
## $emtrends
## fGRAZE AREA.trend SE df lower.CL upper.CL
## 1 0.00078 0.000441 46 -0.000108 0.00167
## 2 0.00254 0.000834 46 0.000861 0.00422
## 3 0.00294 0.000815 46 0.001303 0.00458
## 4 0.00704 0.003173 46 0.000657 0.01343
## 5 0.00529 0.002656 46 -0.000060 0.01063
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## 1 - 2 -0.001759 0.000943 46 -1.865 0.3505
## 1 - 3 -0.002163 0.000927 46 -2.334 0.1528
## 1 - 4 -0.006263 0.003203 46 -1.955 0.3039
## 1 - 5 -0.004505 0.002692 46 -1.673 0.4600
## 2 - 3 -0.000404 0.001166 46 -0.347 0.9968
## 2 - 4 -0.004504 0.003280 46 -1.373 0.6477
## 2 - 5 -0.002746 0.002783 46 -0.987 0.8600
## 3 - 4 -0.004100 0.003276 46 -1.252 0.7215
## 3 - 5 -0.002342 0.002778 46 -0.843 0.9156
## 4 - 5 0.001758 0.004137 46 0.425 0.9930
##
## P value adjustment: tukey method for comparing a family of 5 estimates
Conclusions:
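Because the model uses a log link, the log(AREA) trends have a multiplicative interpretation: doubling patch area multiplies the expected abundance by exp(trend × log 2). A quick sketch using the trend estimates copied from the emtrends output above:

```r
# Multiplicative effect on expected abundance of doubling patch area,
# per grazing level (trend estimates copied from the emtrends output)
trends <- c(`1` = 0.0547, `2` = 0.1781, `3` = 0.2065, `4` = 0.4941, `5` = 0.3708)
round(exp(trends * log(2)), 2) # 1.04 1.13 1.15 1.41 1.29
```

So, for example, in Grazing level 3 a doubling of patch area is associated with roughly a 15% increase in expected abundance, whereas in level 1 the increase is only about 4% (and the trend's confidence interval spans zero).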
loyn.glm1b %>% emmeans(pairwise ~ fGRAZE)
## $emmeans
## fGRAZE emmean SE df lower.CL upper.CL
## 1 3.36 0.0552 46 3.25 3.47
## 2 3.46 0.1161 46 3.23 3.69
## 3 3.52 0.1242 46 3.27 3.77
## 4 4.04 0.4083 46 3.22 4.86
## 5 3.01 0.4769 46 2.06 3.97
##
## Results are given on the log (not the response) scale.
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## 1 - 2 -0.1017 0.129 46 -0.791 0.9319
## 1 - 3 -0.1643 0.136 46 -1.209 0.7460
## 1 - 4 -0.6811 0.412 46 -1.653 0.4722
## 1 - 5 0.3428 0.480 46 0.714 0.9522
## 2 - 3 -0.0626 0.170 46 -0.368 0.9959
## 2 - 4 -0.5794 0.424 46 -1.365 0.6525
## 2 - 5 0.4445 0.491 46 0.906 0.8933
## 3 - 4 -0.5168 0.427 46 -1.211 0.7450
## 3 - 5 0.5071 0.493 46 1.029 0.8406
## 4 - 5 1.0239 0.628 46 1.631 0.4859
##
## Results are given on the log (not the response) scale.
## P value adjustment: tukey method for comparing a family of 5 estimates
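The marginal means above are reported on the log scale; exponentiating returns them to the abundance scale (equivalently, adding `type = "response"` to the `emmeans()` call performs the back-transformation automatically). A quick sketch using the emmean values reported above:

```r
# Back-transform the log-scale marginal means to expected abundances
em_log <- c(`1` = 3.36, `2` = 3.46, `3` = 3.52, `4` = 4.04, `5` = 3.01)
round(exp(em_log), 1) # 28.8 31.8 33.8 56.8 20.3
```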
Since our summary figure will feature both modelled predictors, we might as well overlay the trends onto the raw data.
## loyn_grid <- loyn.glm1b %>%
## insight::get_data() %>%
## modelr::data_grid(AREA=modelr::seq_range(AREA, n=10)) %>%
## as.list()
## ref_grid(loyn.glm1b, cov.reduce=function(x) seq_range(x, n=10)) %>%
## emmeans(~AREA|fGRAZE)
## Using emmeans
loyn.grid <- with(loyn, list(
fGRAZE = levels(fGRAZE),
AREA = seq(min(AREA), max(AREA), len = 100)
))
## OR
loyn.grid <- with(loyn, list(
fGRAZE = levels(fGRAZE),
AREA = seq_range(AREA, n = 100)
))
newdata <- loyn.glm1b %>%
emmeans(~ AREA | fGRAZE, at = loyn.grid, type = "response") %>%
as.data.frame()
head(newdata)
ggplot(newdata, aes(y = response, x = AREA, color = fGRAZE, fill = fGRAZE)) +
geom_point(data = loyn, aes(y = ABUND, color = fGRAZE)) +
## geom_ribbon(aes(ymin=asymp.LCL, ymax=asymp.UCL), color=NA, alpha=0.3) +
geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), color = NA, alpha = 0.3) +
geom_line() +
scale_x_log10(labels = scales::comma) +
scale_y_log10("Abundance", breaks = as.vector(c(1, 2, 5, 10) %o% 10^(0:2))) +
theme_classic()
If we want to limit the range of predictions within each Grazing level…
loyn.grid <- loyn %>%
filter(fGRAZE == 1) %>%
with(list(fGRAZE = "1", AREA = seq_range(AREA, n = 100)))
newdata.1 <- emmeans(loyn.glm1b, ~ AREA | fGRAZE, at = loyn.grid, type = "response") %>%
as.data.frame()
loyn.grid <- loyn %>%
filter(fGRAZE == 2) %>%
with(list(fGRAZE = "2", AREA = seq_range(AREA, n = 100)))
newdata.2 <- emmeans(loyn.glm1b, ~ AREA | fGRAZE, at = loyn.grid, type = "response") %>%
as.data.frame()
loyn.grid <- loyn %>%
filter(fGRAZE == 3) %>%
with(list(fGRAZE = "3", AREA = seq_range(AREA, n = 100)))
newdata.3 <- emmeans(loyn.glm1b, ~ AREA | fGRAZE, at = loyn.grid, type = "response") %>%
as.data.frame()
loyn.grid <- loyn %>%
filter(fGRAZE == 4) %>%
with(list(fGRAZE = "4", AREA = seq_range(AREA, n = 100)))
newdata.4 <- emmeans(loyn.glm1b, ~ AREA | fGRAZE, at = loyn.grid, type = "response") %>%
as.data.frame()
loyn.grid <- loyn %>%
filter(fGRAZE == 5) %>%
with(list(fGRAZE = "5", AREA = seq_range(AREA, n = 100)))
newdata.5 <- emmeans(loyn.glm1b, ~ AREA | fGRAZE, at = loyn.grid, type = "response") %>%
as.data.frame()
newdata.1 %>%
bind_rows(newdata.2) %>%
bind_rows(newdata.3) %>%
bind_rows(newdata.4) %>%
bind_rows(newdata.5) %>%
ggplot(aes(y = response, x = AREA, color = fGRAZE, fill = fGRAZE)) +
geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), color = NA, alpha = 0.3) +
geom_line() +
scale_x_log10(labels = scales::comma) +
scale_y_log10("Abundance") +
theme_classic() +
scale_color_viridis_d() +
scale_fill_viridis_d() +
geom_point(data = loyn, aes(y = ABUND, color = fGRAZE))
The five blocks above can be expressed more compactly by nesting the data by Grazing level and mapping the prediction step over the levels:
loyn %>%
group_by(fGRAZE) %>%
nest() %>%
mutate(
Grid = map2(.x = data, .y = fGRAZE, ~ list(fGRAZE = unique(.y), AREA = seq_range(.x$AREA, n = 100))),
Pred = map(.x = Grid, ~ emmeans(loyn.glm1b, ~ AREA | fGRAZE, at = .x, type = "response") %>% as.data.frame())
) %>%
ungroup() %>%
dplyr::select(Pred) %>%
unnest(Pred) %>%
ggplot(aes(y = response, x = AREA, color = fGRAZE, fill = fGRAZE)) +
geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), color = NA, alpha = 0.3) +
geom_line() +
scale_x_log10(labels = scales::comma) +
scale_y_log10("Abundance") +
theme_classic() +
scale_color_viridis_d() +
scale_fill_viridis_d() +
geom_point(data = loyn, aes(y = ABUND, color = fGRAZE))
Alternatively, we can predict over a single log-spaced grid of areas and then trim the predictions back to the observed range of each Grazing level:
loyn.grid <- with(loyn, list(
fGRAZE = levels(fGRAZE),
AREA = as.vector(seq(1, 10, len = 10) %o% 10^(-1:3)) %>% unique()
))
newdata <- emmeans(loyn.glm1b, ~ AREA | fGRAZE,
at = loyn.grid,
type = "response"
) %>%
as.data.frame()
head(newdata)
loyn.limits <- loyn %>%
group_by(fGRAZE) %>%
summarise(Min = min(AREA), Max = max(AREA))
newdata1 <- newdata %>%
full_join(loyn.limits) %>%
filter(AREA >= Min, AREA <= Max)
head(newdata1)
ggplot(newdata1, aes(y = response, x = AREA, color = fGRAZE, fill = fGRAZE)) +
geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), color = NA, alpha = 0.3) +
geom_line() +
scale_x_log10(labels = scales::comma) +
scale_y_log10("Abundance") +
theme_classic() +
scale_color_viridis_d() +
scale_fill_viridis_d() +
geom_point(data = loyn, aes(y = ABUND, color = fGRAZE))